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while  for  reconstruction  tasks,  the  thresholded  posterior  mean  has  the 
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1.  Introduction. 


A  very  powerful  approach  for  solving  computational  early  vision  problems  is 
to  formulate  them  as  estimation  problems.  The  estimates  are  based  on  probabilistic 
models  that  embody  both  our  prior  generic  knowledge  about  the  behavior  of 
the  solution,  as  well  as  the  process  by  which  the  observations  are  obtained.  In 
particular,  various  researchers  have  used  Bayesian  estimation  and  Markov  Random 
Field  (MRF)  models  to  perform  image  segmentation  (Elliot  et.  al.  (1983);  Hansen 
and  Elliot  (1982);  Geman  and  Geman  (1983);  Marroquin  (1984b);  Cohen  and 
Cooper  (1984))  and  surface  reconstruction  tasks  (Marroquin  (1984a)).  Most  of  these 
researchers  have  assumed  that  the  Maximum  a  Posteriori  (MAP)  estimate,  which 
is  defined  as  the  configuration  which  maximizes  the  posterior  distribution,  provides 
the  best  possible  solution  to  these  problems,  and  several  methods  (such  as  dynamic 
programming  and  "simulated  annealing"  (Kirkpatrick,  et.  al.  (1983)),  a  form  of 
stochastic  relaxation)  have  been  proposed  to  compute  it. 

Let  us  consider,  however,  the  example  portrayed  in  figure  1. 

Panel  (c)  represents  the  MAP  estimate  of  the  binary  MRF  (a)  (whose  parameters 
are  assumed  to  be  known)  from  the  noisy  observations  (b).  It  is  clear  that  the 
estimates  shown  in  (d)  and  (e)  (which  we  will  describe  later)  are  better  than  the  MAP 
estimate  from  almost  any  viewpoint  This  suggests  the  use  of  criteria  other  than 
the  value  of  the  posterior  probability  to  measure  the  performance  of  an  estimator. 
In  particular,  we  propose  the  use  of  the  following  criteria  for  the  two  classes  of 
problems  mentioned  above: 

1.1.  Error  Criterion  for  the  Segmentation  Problem. 


Consider  a  field  x  with  N  elements  each  of  which  can  belong  to  one  of  a  finite 
set  Q,  of  classes.  Let  x,  denote  the  class  to  which  the  ith  element  belongs.  The 
segmentation  problem  is  to  estimate  x  from  a  set  of  observations  (yi, . .  .,yp}.  Note 
that  x,  does  not  necessarily  correspond  to  the  image  intensity.  It  may  represent,  for 
example,  the  texture  class  for  a  region  in  the  image  (as  in  Elliot  et  al.,  1983),  etc. 

A  reasonable  criterion  for  the  performance  of  an  estimate  x  is  the  number  of 
elements  that  are  not  classified  correctly.  Therefore,  we  define  the  segmentation 
error  e,  as: 

N 

e,(x,  x)  =  £  (i  -  S(xi  -  i ,))  ,  x,-,  x,-  €  Qi  (1) 

«wi 


where 


x(a)  _  /!»  =  0 

'  10,  otherwise 


(2) 


Figure  1.  (a)  Sample  function  of  a  binary  MRF.  (b)  Output  of  a  binary  symmetric  channel 
(error  rate:  0.4)  (c)  MAP  estimate,  (d)  Monte  Carlo  approximation  to  the  MPM  estimate,  (e) 
Deterministic  approximation  to  the  MPM  estimate. 


1.2.  Error  Criterion  for  the  Reconstruction  Problem. 


In  this  case,  we  also  consider  a  field  z  with  N  elements  which  can  take  values 
on  finite  sets  {£,},  but  now  we  assume  specifically  that  i,  represents  the  intensity 
of  an  image  (or  the  height  of  a  surface)  at  site  ».  This  suggests  that  an  estimate  x 
should  be  considered  "good"  if  it  is  close  to  x  in  the  ordinary  sense,  so  that  the 
total  squared  error: 

er(z,z)  =.£(*,• -z,)2  (3) 


will  be  a  reasonable  measure  for  its  performance. 

Let  us  now  derive  the  optima!  estimators  for  these  error  criteria. 


2.  Oplima!  Bayesian  Estimators. 


In  this  section  we  introduce  two  estimators  based  on  the  posterior  distribution 
Px |y:  The  Maximizer  of  the  posterior  marginals  xmpm ,  which  is  obtained  by 
maximizing  separately  the  posterior  marginal  distribution  of  each  element,  and  the 
Thresholded  posterior  mean  xjpm-  Their  formal  definition  is  as  follows: 

XMPM{i)  =  q  ’  A|y(?;y)=  sup{F,|„(r;y)}  (4) 

r€Qi 

where  Pt|y  is  the  posterior  marginal  distribution  of  the  ith  element: 

Pi\v[q'>y)=  £  px\ v{x;y)  (5) 


The  TPM  estimate  is: 


xTpM[i)  =  q  ■  {q  -  *i?  =  mf{(r-i,)2} 

rtQi 


(6) 


where  x  is  the  posterior  mean: 


2  =  £^*|»(*;y)  (7) 

X 

We  will  show  that  the  MPM  and  TPM  estimators  are  optimal  with  respect  to  the 
error  criteria  (1)  and  (3)  respectively.  In  what  follows,  we  will  assume  that  the  prior 
and  conditional  distributions  Px,  Py (l  are  known,  so  that  the  posterior  distribution 
can  be  obtained  from  Bayes  rule. 

Proposition  1:  xmpm  is  optimal  with  respect  to  the  segmentation  error  c,. 

This  is  not  a  new  result  (see  for  example  Abend,  1968);  however,  we  present  here 
a  simple  proof: 

Let  x  =  xmpm » and  let  z  denote  any  other  estimate  for  x.  We  will  show  that 

E[c,(x,x)}  <  E[e,(x,z)] 

where  the  expectation  is  taken  over  all  x  and  all  possible  observations  y. 

For  any  fixed  y  we  have,  from  (1): 

e,(x  |  y)  =  E[e.(x,  x)  1  y]  =  £  £(i  “  *(*»’  “  4»))-P*|»(i:  v) 

*  «— 1 


=  E  £  px\v{*\  y)  =  £  (i  -  Pav&u  y)) 

»“1  »“1 


with  P,j„  given  by  (5).  Also, 


N 


*=1 


Therefore, 


N 


*•[*  I  y)  -  Mz  I  y)  =  -  E  Ip»i»(i.-;  y)  -  pi\yfa\  y)] 

*=i 


(8) 


But  by  the  definition  (4)  of  x,  each  term  on  the  right  hand  side  of  (8)  is  non-positive, 
so  that 

e3(x  |  y)  —  ta{z  |  y)  <  0  for  all  y 

and  therefore, 

E[e,(x,x)]  <  E[c4(x,x)]  I 


Note  that  we  have  not  made  any  assumption  about  Px,  Py or  {Q,},  so  that 
this  is  a  completely  general  result  In  particular,  the  sets  of  valid  classes  Q,  do  not 
have  to  be  the  same  for  all  i ,  so  that  this  result  holds  if,  as  in  Geman  and  Geman 
(1983),  x  is  the  pair  {/,  /},  where  /  denotes  the  intensity  of  a  piecewise  constant 
image,  and  l  the  boundaries  between  uniform  regions. 

Proposition  2:  xtpm  is  optimal  with  respect  to  the  reconstruction  error  er. 

Proof:  Again,  we  put  x  =  xTPM,  and  let  z  denote  any  other  estimate.  For  any  fixed 
y  we  have,  from  (3): 


M2  I  y)  =  E  E  fa  -  *i)2p*ivfa  y) 

x  »=1 


where 


N 


-  E  E(z?  -  2xfa  +  x2)p*\vfay) 

t— 1  z 


N 


=  EK2*-**)2 +*?-*<] 

»—i 

*i  =  Ex<F*ivfav) 


also, 


xi=Ex<Pzlyfav) 


N 


M*  I  y)  =  E  K2*-  ~  fa)2  +  -  2<] 
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so  that 


cr(£  |  y)  -  er(z  |  y)  —  £  [(*«  ~  *i?  ~  (*»  “  zi?\  <  0 

i=i 

from  the  definition  (6)  of  x  i 

Remarks: 

1.  In  the  limit  of  fine  quantization  the  variables  i,  become  continuous  (i.e.,  Q,  =R 
for  all  i).  In  this  case  we  recover  the  classical  result  that  establishes  that  the  posterior 
mean  minimizes  the  mean  squared  error;  for  the  particular  case  of  Gaussian  (Habibi, 
1972;  Nahi  and  Assefi,  1972)  or  isotropic  fields  (Levy  and  Tsitsiklis,  1983),  it  is  then 
possible  to  construct  efficient  algorithms  that  compute  this  quantity. 

1  The  surface  reconstruction  problem  treated  by  Marroquin  (1984a)  can  be 
considered  a  particular  instance  of  the  reconstruction  problem  defined  here.  If  we 
let  i  be  the  pair  {/,/},  where  /  is  the  (continuous  valued)  depth  process  and  /  is 
a  binary  line  process,  the  optimal  estimator  will  be  xtpm  (note  that  for  a  binary 
variable,  xtpm[i)  —  xMPM(i)). 

3.  These  results  can  also  be  applied  to  the  case  of  an  m-ary  line  process  (like  the 
one  suggested  by  Geman  and  Geman  (1983))  by  defining  an  appropriate  mixed 
error: 

*»(/.  i,  /,  i)  -  £  (/.-  %? + x  E  (i  -  <(1.  - '«) 

*«1  *«-i 

with  6  as  defined  in  equation  (2).  One  can  show,  using  arguments  similar  to  those 
of  the  proofs  above,  that  the  optimal  estimate,  for  any  positive  value  of  X  will 
be  OtpmJmpm}-  Note  that  the  posterior  mean  for  /  and  the  posterior  marginal 
distributions  for  /  must  be  computed  by  averaging  over  all  possible  values  of  both 
/  and  l: 

/  / 

=  E  E  *WMv) 

/  **-! 


3.  Algorithms. 

As  in  the  case  of  the  MAP  estimate,  the  exact  computation  of  the  optimal  MPM 
and  TPM  estimates  is  a  formidable  task,  except  for  very  small  N.  It  is  possible, 
however,  to  define  general  distributed  Monte  Carlo  procedures  that  will  allow  us  to 
approximate  these  values  as  precisely  as  we  want 

The  general  idea  is  to  use  the  Metropolis  algorithm  (Metropolis,  1953)  to 
generate  global  configurations  of  the  field  x  which  are,  asymptotically,  distributed 


5 


according  to  Px  |  y.  Since  the  Markov  chain  defined  by  the  Metropolis  algorithm  is 
known  to  be  ergodic  (Geman  and  Geman,  1983),  we  can  approximate  x  by: 


(9) 


and  the  posterior  marginals  by: 


pf|V(g) »  r~r  E  ~  ?)  (10) 

K  n  t—k 

where  is  the  configuration  generated  by  the  Metropolis  algorithm  at  time  t,  and 
k  is  the  time  required  for  the  system  to  be  in  thermal  equilibrium.  From  these 
values,  xMPM  and  &tpm  can  be  easily  computed  using  (4)  and  (6). 

If  we  compare  this  procedure  with  the  use  of  "Simulated  annealing”  (Kirkpatrick, 
et.al.,  1983)  for  approximating  the  MAP  estimate,  we  note  that  even  from  a 
computational  viewpoint  it  has  some  distinct  advantages: 

1.  It  is  difficult  to  determine  in  general  the  descent  rate  of  the  temperature 
(annealing  schedule)  that  will  guarantee  the  convergence  of  the  annealing 
process  in  a  reasonable  time  (it  usually  involves  a  trial  and  error  procedure). 
Since  we  are  running  the  Metropolis  algorithm  at  a  fixed  temperature  (T  —  l), 
this  issue  becomes  irrelevant 

2.  Since  in  our  case  we  are  using  a  Monte  Carlo  procedure  to  approximate 
the  values  of  some  integrals,  we  should  expect  a  nice  convergence  behavior,  in 
the  sense  that  coarse  approximations  can  be  computed  very  rapidly,  and  then 
refined  to  an  arbitrary  precision. 

The  main  disadvantage  of  this  procedure  is  that  in  the  case  of  the  segmentation 
problem,  a  large  amount  of  memory  might  be  required  if  the  number  of  classes 
per  element  m  is  large  (we  need  to  store  the  N(m  - 1)  numbers  that  define  the 
posterior  marginals). 

With  respect  to  the  relative  performance,  we  point  out  that  for  high  signal  to 
noise  ratios,  the  MAP  estimate  is  usually  close  to  the  optimal  one  (which  explains  its 
good  performance  in  many  cases).  If  the  noise  level  is  high,  however,  the  difference 
in  the  performances  of  the  two  estimators  may  be  dramatic  (see  fig.  1). 

Finally,  from  a  conceptual  viewpoint,  the  definition  of  the  solution  of  the 
estimation  problem  in  terms  of  the  posterior  mean  (or  the  posterior  marginals) 
may  lead  in  some  cases  to  the  construction  of  efficient  deterministic  algorithms  for 
approximating  the  solution.  To  make  these  ideas  concrete,  we  now  analyze  in  detail 
a  specific  instance  of  the  segmentation  problem. 


4.  Optimal  Segmentation  of  Dinar)  MRFs. 


In  this  section  we  will  consider  the  following  particular  example  of  the  general 
segmentation  problem: 

Let  /  represent  a  binary  pattern  on  a  rectangular  lattice  L  with  N  elements, 
which  is  a  particular  realization  of  a  Markov  Random  Field  with  potentials  given 
by: 


r-l,  if  U  =  fj  and  ||t- i||  e  (0,1] 

V(fi,  fj)  =1,  if  fiji  U  and  ||i  - ;||  €  (0,  l] 

lo,  otherwise 

where  t  and  j  are  sites  of  L  and  ||*  -  i||  is  the  distance  betweeen  i  and  j  (i.e.,  / 
is  a  realization  of  a  two-dimensional  lsing  net).  Without  loss  of  generality,  we  will 
assume  that  /,•  €  {0, 1}  for  all  i  €  L. 

The  probability  density  of  the  configurations  F  is  given  by: 

Pr(f  =  /)=!  «p|-itf(/)]  (11) 

where  Z  is  a  normalizing  constant;  the  energy  U{f)  is: 

d(/)=  e  niiJi) 


where  i,j  €  Nn  means  that  t  and  j  are  nearest  neighbors,  and  T0  is  the  natural 
temperature  of  the  system. 

Suppose  that  /  is  sent  through  a  binary  symmetric  channel  with  error  rate  e, 
so  that  the  conditional  distribution  of  the  observations  g  is: 


i /<)-{£  £)| 


if  9i  =  Si 

if  si  7^  Si  i  Si,  Si  €  {o,  1} 


The  posterior  distribution  is: 


Pf\,(S-,g) 


(12) 


where  ZP  is  a  constant,  and  the  posterior  energy  Up  is: 

vp  -  i  e  nu,  n ) + °  E(i  -  M  -  *» 

lo  ij  i 


(13) 


Figure  2.  Ratio  of  the  average  errors  of  the  MAP  and  MPM  estimators  for  a  2  x  2  Ising  net. 
4.1.  Error  Analysis. 

Using  equations  (1)  and  (12),  it  is  not  difficult  to  write  down  the  exact  expressions 
for  the  average  segmentation  error  corresponding  to  die  MAP  and  MPM  estimators 
for  this  problem.  However,  the  numerical  evaluation  of  these  expressions  is  feasible 
only  for  small  values  of  N. 

In  figure  2  we  show  a  plot  of  the  ratio: 

__  E\*map) 
e[*mpm) 


for  a  2  x  2  lattice,  for  different  values  of  the  error  rate  t  and  the  natural  temperature 
T0.  As  expected,  r  is  never  less  than  1.  In  the  worst  case  (for  e  =  0.1  and  To  =  0.2) 
the  error  of  the  MAP  estimate  is  1.17  times  that  of  the  MPM  estimate;  if  7b  is  not 


too  small  and  t  is  not  too  large,  both  estimates  coincide,  and  as  t  approaches  0.5 
(low  signal  to  noise  ratio),  the  MPM  estimate  is  consistently  better  than  the  MAP. 
An  experimental  analysis  of  larger  lattices  reveals  a  similar  qualitative  behavior,  but 
the  values  of  r  are  much  larger  in  this  case. 

4.2.  Example. 

We  now  return  to  the  example  presented  in  figure  1,  and  examine  it  in  more 
detail.  Panel  (a)  represents  a  typical  realization  of  a  64  X  64  lsing  net  with  free 
boundaries,  using  a  value  of  To  =  1.74  (0.75  times  the  critical  temperature  of  the 
lattice);  panel  (b),  the  output  of  a  binary  symmetric  channel  with  error  rate  e  —  0.4; 
panel  (c)  the  MAP  estimate,  and  panel  (d)  an  approximation  to  the  MPM  estimate 
(which  we  will  label  "MPM  (M.C.)")  obtained  using  the  Metropolis  algorithm  and 
equation  (10)  to  estimate  the  posterior  density.  The  corresponding  values  of  the 
posterior  energy  Up  (equation  (13))  and  the  relative  segmentation  error  (e,/642)  are 
shown  on  table  1. 


Table  1 

/ 

9 

* 

/map 

/mpm^-^-) 

Energy 

-5594.8 

-226.0 

-6660.9 

-6460.0 

-6427.0 

Seg.  Error 

— 

0.4 

0.33 

0.128 

0.124 

4.3.  A  Fast  Algorithm.  * 

For  this  problem,  it  is  possible  to  construct  a  fast  deterministic  algorithm  whose 
experimental  performance  (in  terms  of  the  average  segmentation  error)  is  equivalent 
to  the  Monte  Carlo  method  discussed  above.  It  is  based  on  the  following  ideas: 

First,  we  note  that  for  a  binary  pattern,  the  MPM  and  TPM  estimates  coincide. 
We  will  approximate  the  posterior  mean  of  (12)  by  that  of  a  Gaussian  distribution 
Pg  with  the  property: 

PG(h)  =  for  all*  e  {0,1}. 

Zp 

In  particular,  we  use: 

PgW  =  exp  [~  Jr  £  (h*  M2  “  <*  U**'  ~  «•')*]• 

For  this  distribution,  h,  is  the  (unique)  minimizer  of  the  convex  function: 

UaW  =  ~ 

‘0 


which  corresponds  to  the  unique  fixed  point  of  the  system: 

(*+,)  +  aTogj 

*  m  +  aTQ 


(14) 


where 

Ni  =  {jeL  :  ||t-y||  =  i}. 
We  could  now  approximate  our  estimate  by  putting: 

.* 

/*•  -  ©(M 


where 


if*>  i 

otherwise 


(15) 


There  is  an  additional  consistency  condition  that  f  must  satisfy,  however.  It  can 
be  shown  that  when  the  posterior  distribution  has  the  form  given  by  (12)  and  (13), 
the  MPM  estimate  /,  which  by  definition  satisfies: 


^i\g(fi>9)  >  -f»)®((l  /»')>  ff) 


also  satisfies: 


Pi\B{hf)>Pi\M  -/,);/) 


(16) 


which  means  that  if  we  replace  the  observations  by  the  MPM  estimate,  and  compute 
a  new  MPM  estimate  for  this  modified  problem,  we  should  get  the  same  result 

Translating  this  condition  to  the  case  of  }  ,  we  get  that  it  must  satisfy: 


h  =  ©(O 


(17) 


h:  = 


where  h *  satisfies: 

SyeAT.  hj  +  qTqQ(\) 
\Ni\  +  aT0 

In  practice,  we  get  h*  as  the  fixed  point  of  the  system: 

;*(*) 


.•(HI)  V"  +  “W?1) 

'  |W(|  +  «r0 


(18) 


with 


=  h 
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*  • 


Note  that  the  function: 

C4M  =  E  (*.-  M2  +  E(Ai  -  e(fi,))! 

*j‘€M  » 


v 

« 


acts  as  a  Lyapunov  function  for  the  system  (18),  which  is  therefore  (locally)  stable 
(Vidyasagar,  1978). 

This  algorithm  can  be  visualized  as  operating  in  two  steps:  In  the  first  one, 
we  extract  all  the  information  that  we  need  from  the  observations  and  encode  it  in 
h  (which  is  continuous-vaiued),  and  in  the  second  one,  we  find  the  closest  binary 
pattern  that  satisfies  the  consistency  condition  (16). 

To  illustrate  the  performance  of  this  approximation,  we  show  f  ,  for  the 
example  discussed  above,  in  panel  (e)  of  figure  1,  and  its  corresponding  energy  and 
segmentation  error  in  the  last  column  of  table  1  (labeled  "MPM  deL"). 


4.4.  Dimensionality  of  the  Parameter  Space. 


ES 


I 


* 


Equations  (14)  and  (18),  which  describe  the  deterministic  approximations  to 
Smpm  depend  on  the  parameters  of  the  system,  e  and  T0,  only  through  the  product: 


T  =  ar0  =  r0iog(^-i) 


(19) 


which  means  that  the  behaviour  of  the  algorithm  is  completely  characterized  by  the 
single  parameter  7.  In  the  case  of  the  Monte  Carlo  approximation  (10),  if  we  fix 
the  value  of  7,  the  value  of  Tq  cannot  be  chosen  arbitrarily,  since  it  has  to  satisfy 
the  consistency  condition: 


r0  =  ? 

a 


with 


i=l0*(Lr)  ; 


N 


;v  «-l 


where  z  is  the  residual  process  defined  as: 

=  I1’ 

(0,  otherwise 


(20) 


(21) 


This  means  that,  given  7,  the  correct  value  of  To  can,  in  principle,  be  determined 
in  an  adaptive  way,  so  that  in  this  case  loo,  the  behaviour  of  the  approximation 
depends  effectively  only  on  7. 

5.  Simultaneous  Estimation  of  the  Field  and  the  Parameters. 


The  above  considerations  suggest  that  we  can  estimate  simultaneously  the 
original  field  and  the  parameters  of  the  system  by  looking  for  the  supremum  of  an 
appropriate  likelihood  function  over  the  one-dimensional  subspace  parametrized 
by  7.  We  proceed  as  follows: 

For  a  given  value  of  7,  we  can  approximate  the  corresponding  MPM  estimate 
/  using  the  methods  developed  in  the  previous  section,  and  compute  the  residual 
process  z  and  the  conditional  (on  7)  Maximum  Likelihood  Estimate  of  the  error 
rate  c  using  equations  (20)  and  (21).  The  corresponding  conditional  estimate  for  To 
will  be: 

fro  -  l  (22) 

To  measure  the  "likelihood"  of  the  estimate  f,  we  use  the  degree  of  uniformity 
(or  "whiteness")  of  the  residual  process  z.  This  property  can  be  quantified  by  the 
variance  of  the  local  noise  density,  which  we  estimate  as  follows: 

We  cover  the  lattice  with  a  set  {Sj}  of  m  non-overlapping  squares  (say,  8 
pixels  wide).  For  each  square  Sj,  the  relative  variance  of  the  noise  density  is: 


where  |Sj|  is  the  area  of  the  jth  square. 

The  desired  likelihood  function  is: 

LO)  =  -  £>;  (24) 

y-i 


Note  that  a  more  conventional  likelihood  function,  such  as  the  conditional 
likelihood  proposed  by  Besag  (1972),  will  not  work  in  this  case;  this  function  is 
defined  as: 


L{jy.  Li  (/) +  !»(/) 


with 


i*(h-  n  pOi\Ji,j€NiX)= 

*'€C* 
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«Pi-£E,€*,V(/,j,.)] 

«?[-£  EieN,  V(u’)\  +  «cp[- £  £*„,  V(1  -  /„/,)] 

=  n  (1  +otp|r-  E  v(/.-. />)])“'.  *:  =  1,2 

*€C,  r0  J'€M 

where  the  "codings"  C}  and  C2  are  sets  of  conditionally  independent  sites  defined 
as: 

C\  =  {z  :  (x,  is  odd  and  yt-  is  even)  or  (x,  is  even  and  yt-  is  odd)} 

C2  =  {*  :  (xt-  is  odd  and  y,is  odd)  or  (x,  is  even  and  yt-  is  even)} 

with  (x„y,)  denoting  the  row  and  column  indices  of  site  In  our  case,  we  find 
that  as  7  decreases,  f  becomes  more  and  more  uniform,  while  7b  remains  almost 
constant  It  is  not  difficult  to  see  that  as  a  result,  the  conditional  likelihood  L  will 
decrease  monotonically  with  7,  which  renders  it  useless  for  our  purpose. 

The  range  of  values  [70,  im]  of  the  parameter  7  that  corresponds  to  the  class 
of  systems  of  interest  can  be  determined  as  follows: 

One  can  show  that  for  7  >  8  we  will  always  have  Jmpmi  —  9i  for  all  z,  so  that 
we  can  use  —  8.  The  value  of  70  can  be  obtained  from  an  upper  bound  for  c 
and  a  lower  bound  for  T0.  For  example,  assuming  that  e  <  .45  and  T0  >  .57;,  we 
get  70  =  .23.  (Note  that  when  the  natural  temperature  To  of  a  first  order,  isotropic 
MRF  is  below  0.5  times  Tc  (the  critical  temperature  of  the  lattice;  see  Kindermann 
and  Snell,  1980),  the  patterns  become  practically  uniform  (i.e.,  /,■  =constant  for  all 
z),  while  for  values  of  7b  greater  than  1.57;,  we  get  patterns  that  are  practically 
indistinguishable  from  white  noise.  Therefore,  the  assumption  To  >  .57;  covers 
practically  all  the  interesting  cases). 

The  complete  estimation  procedure  is  as  follows: 

Maximum  Likelihood  Estimation  Algorithm: 

1:  Sample  the  interval  [70,  im]  at  the  points 

7o  <  7i>  •  •  -In  <  7 M 

2:  For  each  7  €  Q  —  {71.  •  •  -7n} : 

2.1:  Find  }{i)  using  (14)  and  (18). 

2.2:  Compute  z  using  (21). 

2.3:  Compute  e  using  (20).  If  i  =  0,  set  I(/(7))  =  -00  and  proceed  with  the 
next  value  of  7.  Otherwise,  compute  a  and  go  to  2.4. 

2.4:  Compute  To  using  (22). 


Figure  3.  (a)  Synthetic  image,  (b)  Noisy  observations,  (c)  Maximum  Likelihood  Estimate. 


2.5:  Compute  L (/(nr))  using  (23)  and  (24). 

3:  Compute  the  optimal  estimate  /  using: 

/  =/(7*)  :  =  »up  L{fy))  (25) 

The  corresponding  e ,  T0  will  be  the  optimal  estimates  for  c  and  T0,  respectively. 
Remarks: 

1.  This  estimation  algorithm  allows  us  to  reconstruct  a  binary  pattern  /  from 
the  noisy  observations  g  without  having  to  adjust  any  free  parameters.  The  only 
prior  assumptions  correspond  to  the  qualitative  structure  of  the  field  /  (first  order, 
isotropic  MRF)  and  to  the  nature  of  the  noise  process,  but  neither  the  natural 
temperature  Tq  nor  the  error  rate  e  have  to  be  known  in  advance.  In  practice,  this 
means  that  we  can  apply  it  to  restore  any  binary  image  with  uniform  granularity, 
even  if  it  has  not  been  generated  by  a  Markov  random  process.  We  have  used  this 
algorithm  to  reconstruct  a  variety  of  binary  images  with  excellent  results.  In  figure  3 
we  show  such  a  restoration.  The  observations  (b)  were  generated  from  the  synthetic 
image  (a)  with  an  actual  error  rate  of  .35  (assumed  unknown).  The  MLE  for  /  is 
shown  in  (c). 

2.  This  procedure  can  be  easily  extended  to  handle  any  one-parameter  noise 
corruption  process  (such  as  zero  mean,  additive  white  Gaussian  noise).  The  extension 


to  the  case  of  N-ary  fields,  i.e.,  to  the  restoration  of  piecewise  constant  images,  is 
also  straightforward  (using  the  general  algorithm  (10)  together  with  equation  (4) 
instead  of  (14)  and  (18)  in  step  2.1). 

3.  We  have  found  that  the  likelihood  function  (24)  is  reasonably  well  behaved  as 
a  function  of  7.  This  permits  us  to  perform  the  one-dimensional  search  for  its 
supremum  in  an  economical  way,  by  first  determining  its  approximate  location  using 
a  coarse  sampling  pattern,  and  then  refining  its  position  by  a  fine  sampling  of  a 
reduced  interval.  In  practice,  it  is  possible  to  get  very  good  results  using  on  the 
order  of  15  samples. 

4.  The  whole  procedure  is  highly  distributed,  so  that  it  is  possible  to  implement  it 
in  parallel  hardware  in  a  very  efficient  way. 

6.  Conclusions. 

A  very  fruitful  approach  to  the  solution  of  image  segmentation  and  surface 
reconstruction  tasks  is  their  formulation  as  estimation  problems,  via  the  use  of 
MRF  models  and  Bayes  theory.  However,  the  MAP  estimate,  which  has  been  often 
used,  will  not  always  give  the  best  solution.  We  have  shown  that,  for  segmentation 
problems,  the  optimal  Bayesian  estimator  is  the  maximizer  of  the  posterior  marginals 
(MPM),  while  for  reconstruction  tasks,  the  thresholded  posterior  mean  (TPM)  has 
the  best  possible  performance.  In  some  cases,  these  estimators  will  give  results 
that  are  similar  to  those  obtained  by  the  MAP  procedure,  but  in  many  interesting 
situations  (in  particular  for  low  signal  to  noise  ratios)  their  performance  will  be 
significantly  better. 

The  exact  computation  of  these  estimates  (including  the  MAP)  is  computationally 
unfeasible  in  most  practical  situations.  We  have  presented  a  general  Monte  Carlo 
procedure  which  approximates  their  value  as  precisely  as  desired  (at  the  expense  of 
increased  computational  cost).  This  procedure  can  be  easily  implemented  in  parallel 
hardware,  and  it  has  the  advantage  that  coarse  solutions  are  computed  very  fast, 
and  are  then  progressively  refined.  We  have  also  shown  how  in  particular  cases,  it  is 
possible  to  use  the  conceptual  framework  provided  by  these  estimators  to  construct 
fast  algorithms  with  very  good  experimental  performance. 

Finally,  we  have  developed  a  maximum  likelihood  procedure  for  the  simul¬ 
taneous  estimation  of  a  piecewise  constant  field  and  the  parameters  of  the  system. 
This  construction  leads  to  a  parameter-free  distributed  algorithm  for  reconstiucting 
piecewise  constant  images  from  noisy  observations. 
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